Comprehending Real Numbers: Development of Bengali Real Number Speech Corpus

Authors

  • Md Mahadi Hasan Nahid
  • Md. Ashraful Islam
  • Bishwajit Purkaystha
  • Md Saiful Islam
Abstract

Speech recognition has received little attention in the Bengali literature because no comprehensive dataset has been available. In this paper, we describe the development of the first comprehensive Bengali speech dataset on real numbers. It covers all the words that may arise in uttering any Bengali real number. The corpus was recorded by ten native Bengali speakers from different regions. It comprises more than two thousand speech samples with a total duration of close to four hours. We also provide a detailed analysis of the corpus, highlight some of its notable features, and evaluate the performance of two notable Bengali speech recognizers on it.

Similar resources

Web-Based Bengali News Corpus for Lexicon Development and POS Tagging

Lexicon development and Part of Speech (POS) tagging are very important for almost all Natural Language Processing (NLP) applications. The rapid development of these resources and tools using machine learning techniques for less computerized languages requires appropriately tagged corpus. We have used a Bengali news corpus, developed from the web archive of a widely read Bengali newspaper. The ...

Lexicon Development and POS Tagging Using a Tagged Bengali News Corpus

Lexicon development and Part of Speech (POS) tagging are very important for almost all Natural Language Processing (NLP) application areas. The rapid development of these resources and tools using machine learning techniques for less computerized languages requires an appropriately tagged corpus. A tagged Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper...

Some properties of fuzzy real numbers

In mathematical analysis, some theorems and definitions are established for both real and fuzzy numbers. In this study, we prove Bernoulli's inequality for fuzzy real numbers and give some of its applications. We also prove two other theorems for fuzzy real numbers that have previously been proved for real numbers.

Connected Digit Recognition Experiments with the OGI Toolkit's Neural Network and HMM-Based Recognizers

This paper describes a series of experiments that compare different approaches to training a speaker-independent continuous-speech digit recognizer using the CSLU Toolkit. Comparisons are made between the Hidden Markov Model (HMM) and Neural Network (NN) approaches. In addition, a description of the CSLU Toolkit research environment is given. The CSLU Toolkit is a research and development softwa...

A Hybrid Model for Part-of-Speech Tagging and its Application to Bengali

This paper describes our work on Bengali Part of Speech (POS) tagging using a corpus-based approach. There are several approaches to part of speech tagging. This paper deals with a model that combines supervised and unsupervised learning using a Hidden Markov Model (HMM). We make use of a small tagged corpus and a large untagged corpus. We also make use of a Morphological Analyzer. ...

Publication date: 2018